SpLaSH (Spoken Language Search Hawk): Integrating Time-Aligned with Text-Aligned Annotations

Authors

  • Sara Romano
  • Elvio Cecere
  • Francesco Cutugno
Abstract

In this work we present SpLaSH (Spoken Language Search Hawk), a toolkit for performing complex queries on spoken language corpora. SpLaSH provides tools for integrating time-aligned annotations (TMA), represented as annotation graphs, with text-aligned annotations (TXA), represented as generic XML files. It imposes very few constraints on data model design, so annotations developed separately, with no dependency on one another, can be integrated within the same dataset. It also provides a GUI supporting three types of query: simple queries on TXA or TMA structures, sequence queries on TMA structures, and cross queries on integrated TXA and TMA structures.
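The three query types named in the abstract can be illustrated with a minimal Python sketch. This is not SpLaSH's data model or API: the `Arc` class, the toy data, the function names, and in particular the `token_id` field used to link a time-aligned arc to an XML element are all assumptions made for illustration, since the abstract does not say how the two annotation layers are linked.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arc:
    """One labelled span of a toy annotation graph (hypothetical structure)."""
    tier: str
    start_ms: int
    end_ms: int
    label: str
    token_id: Optional[str] = None  # assumed link to a TXA element id


# Toy time-aligned annotations (TMA): a word tier and a phone tier.
ARCS = [
    Arc("word", 0, 300, "the", "w1"),
    Arc("word", 300, 800, "hawk", "w2"),
    Arc("word", 800, 1200, "flies", "w3"),
    Arc("phone", 300, 450, "h"),
    Arc("phone", 450, 650, "O:"),
    Arc("phone", 650, 800, "k"),
]

# Toy text-aligned annotations (TXA): generic XML with POS tags.
TXA_ROOT = ET.fromstring(
    '<s><w id="w1" pos="DET">the</w>'
    '<w id="w2" pos="NOUN">hawk</w>'
    '<w id="w3" pos="VERB">flies</w></s>'
)


def simple_query(arcs, tier, label):
    """Simple query on TMA: all arcs on one tier carrying a given label."""
    return [a for a in arcs if a.tier == tier and a.label == label]


def sequence_query(arcs, tier, labels):
    """Sequence query on TMA: runs of consecutive arcs matching the labels."""
    ordered = sorted((a for a in arcs if a.tier == tier), key=lambda a: a.start_ms)
    n = len(labels)
    return [
        ordered[i:i + n]
        for i in range(len(ordered) - n + 1)
        if [a.label for a in ordered[i:i + n]] == list(labels)
    ]


def cross_query(arcs, root, pos, min_duration_ms):
    """Cross query: arcs linked to XML words with a given POS tag
    that last at least min_duration_ms."""
    ids = {w.get("id") for w in root.iter("w") if w.get("pos") == pos}
    return [
        a for a in arcs
        if a.token_id in ids and (a.end_ms - a.start_ms) >= min_duration_ms
    ]
```

For example, `cross_query(ARCS, TXA_ROOT, "NOUN", 400)` selects the word arc for "hawk", combining a condition on the XML layer (part of speech) with one on the time-aligned layer (duration). Times are stored as integer milliseconds to avoid floating-point comparisons.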

Similar resources

New Features in Spoken Language Search Hawk (SpLaSH): Query Language and Query Sequence

In this work we present further development of the SpLaSH (Spoken Language Search Hawk) project. SpLaSH implements a data model for annotated speech corpora integrated with textual markup (i.e. POS tagging, syntax, pragmatics) including a toolkit used to perform complex queries across speech and text labels. The integration of time aligned annotations (TMA), represented making use of Annotation...

Multi-level Annotation of Speech: An Overview of The Emu Speech Database Management System

Researchers in various fields, from acoustic phonetics to child language development, rely on digitised collections of spoken language data as raw material for research. Access to this data has, in the past, been provided in an ad-hoc manner with labelling standards and software tools developed to serve only one or two projects. A few attempts have been made at providing generalised access to s...

Multi-level annotation in the Emu speech database management system

Researchers in various fields, from acoustic phonetics to child language development, rely on digitised collections of spoken language data as raw material for research. Access to this data had, in the past, been provided in an ad-hoc manner with labelling standards and software tools developed to serve only one or two projects. A few attempts have been made at providing generalised access to sp...

The TASX-environment: an XML-based toolset for time aligned speech corpora

This paper describes the design and implementation of an XML-based corpus environment for multi-tier annotated speech data. The TASX-environment (TASX: Time Aligned Signal data eXchange format) constitutes the technical basis for a corpus designed to explore the acquisition of prosody by second language learners. It supports all aspects of the corpus setup procedure: XML-based annotation of the...

Translation as Annotation

In this paper we illustrate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the key notion that translating a text can be seen as a linguistic annotation task which is easier than manual annotation with formal schemes. After translation, formal annotations can be automatically derived...

Publication date: 2009